DOCUMENT RESUME

ED 243 904                                              TM 830 842

TITLE          Archiving Methodology. Volume I: Project Officer's Guide.
INSTITUTION    Leinwand (C.M.) Associates, Inc., Newton, Mass.
SPONS AGENCY   National Inst. of Education (ED), Washington, DC.
PUB DATE       30 Jul 79
NOTE           50p.; For related documents, see TM 830 843-845.
PUB TYPE       Reports - Research/Technical (143)
EDRS PRICE     MF01/PC02 Plus Postage.
DESCRIPTORS    *Archives; *Databases; *Data Collection; *Delivery Systems; Diffusion (Communication); Documentation; *Federal Government; Guidelines; Information Dissemination; Information Utilization; *Models
IDENTIFIERS    *Secondary Analysis

ABSTRACT
Recently, to encourage secondary analysis, the federal government has begun to arrange for public policy data to be documented, archived, and released to the public. The purpose of this document is to provide government project officers with guidelines for archiving government-sponsored data files. The guidelines represent a model for systematically transferring data from the original data collection contractors to the public domain in a form amenable to secondary analysis. The model has four stages: (1) establishing requirements, policies, and procedures to facilitate data archiving; (2) deciding whether a specific data set will be archived; (3) creating an archived data set; and (4) transferring the data to a consortium which will maintain and disseminate them. (PN)




***********************************************************************
*   Reproductions supplied by EDRS are the best that can be made      *
*                    from the original document.                      *
***********************************************************************






U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization originating it.
Minor changes have been made to improve reproduction quality.
Points of view or opinions stated in this document do not necessarily represent official NIE position or policy.



ARCHIVING METHODOLOGY
VOLUME I: PROJECT OFFICER'S GUIDE

Submitted to
National Institute of Education
by
C.M. Leinwand Associates, Inc.
July 30, 1979



CONTENTS

INTRODUCTION

I.   STAGE ONE: ESTABLISHING POLICIES

II.  STAGE TWO: DECIDING WHAT TO ARCHIVE
     A. Background
     B. Description of Evaluation Criteria
        1. Scope of Data Set
        2. Study Design
        3. Topical Area
        4. Public Interest
        5. Quality of Data
        6. Summary of Data-Yield Criteria
     C. Archived Data Sets: Contents and Levels of Support
        1. File Structuring
        2. Archive Documentation
     D. Data Archiving Advisory Committee
        1. Standing In-House Committee
        2. Ad Hoc In-House Committee
        3. Standing Extramural Committee
        4. Ad Hoc Extramural Committee

III. STAGE THREE: CREATING THE DATA ARCHIVE
     A. Data Collection and Analysis Contract Modifications
     B. Archive Contract
     C. Review of Archive Deliverables
        1. Tape Review for All Data
        2. Checking the Documentation
        3. Preliminary Documentation Checks

IV.  STAGE FOUR: RELEASE AND DISSEMINATION
        1. United States Archives: Machine-Readable Archives Division
        2. United States Department of Commerce: National Technical Information Service
     B. Private Organizations for Dissemination
        Inter-university Consortium for Political and Social Research
     C. Recommendations

FIGURES
     1. Evaluation Form for Archiving Decisions
     3. Bureau of Labor Statistics Local Area Unemployment Statistics
     4. Key Word in Context (KWIC)
     5. Conceptual Index
     6. Comparison of Four DAAC Models



PROJECT OFFICER'S GUIDE TO DATA ARCHIVING



Since the Census of 1790, the federal government has been collecting data for public policy purposes. As these data collection activities have grown, so has the potential for the use of the data themselves. Recently, to encourage secondary analysis, the federal government has begun to arrange for these data to be documented, archived, and released to the public. This support has been prompted by the recognition that data collection is an expensive proposition and that it is in the public interest to maximize the use of data acquired with federal funds. In Appendix A we present a detailed rationale for archiving and releasing data for public policy research.

The purpose of this document is to provide government project officers with guidelines for archiving government-sponsored data files. The guidelines represent a model for systematically transferring data from the original data collection contractors to the public domain in a form amenable to secondary analysis. The model has four stages: 1) establishing requirements, policies, and procedures to facilitate data archiving; 2) deciding whether a specific data set will be archived; 3) creating an archived data set; and 4) transferring the data to a consortium which will maintain and disseminate them.

Stage One of data archiving takes place entirely at the federal level. It entails establishing policies and procedures for data archiving, including requirements for data archiving in requests for proposals and contracts, and establishing ownership of the data. Stage Two occurs when a new data collection project is initiated. The focus of this stage is deciding whether to archive, what to archive, and how much effort to devote to archiving the data selected. After these decisions have been made, Stage Three, creating the archived data set and its associated documentation, begins. Typically, the data are archived through the interaction of two organizations: the organization responsible for the initial data collection and analysis involved in a study, and the data archiving organization responsible for the preparation of the final user-level documentation. The dissemination activities of Stage Four involve storing and maintaining the data in a manner that maximizes their use, publicizing the availability of the data, and providing assistance to interested researchers.




I. STAGE ONE: ESTABLISHING POLICIES



The primary purposes of Stage One are to establish procedures for data archiving and to inform all contractors, present and potential, that they should be prepared to comply with requirements established to facilitate data archiving.

To meet these objectives, all requests for proposals (RFPs) and contracts should specify that machine-readable data generated by government support are to be considered in the public domain. This provision should clearly establish federal ownership of the data and stipulate that primary data files deemed valuable for secondary analysis are to be placed in a repository designated by the federal agency within a reasonable period of time. All data contractors should be prepared to submit data tapes and documentation with their final reports. RFPs and contracts should also state that releasing and disseminating data is the responsibility of the sponsoring federal agency, not that of the primary investigators or other data collection contractors.

Generally, current statutes and judicial interpretation support the public nature of data files collected at the government's expense. However, there are also laws which protect the privacy of individuals and organizations supplying the data. Both the RFP and the contract must make data collectors aware of these laws before the collection process begins. In addition, it is necessary to develop an agreement which will satisfy archiving requirements and, at the same time, comply with the Freedom of Information Act, the Privacy Act, and other related statutes.

Making such considerations prior to entering into a contract does not necessarily mean that the data a particular project produces will be archived. In fact, since the whole concept of data archiving is relatively new, it is likely that, in the near future, only a small percentage of the data collected under government contracts will be archived. Deciding which data to archive is the focus of Stage Two.






II. STAGE TWO: DECIDING WHAT TO ARCHIVE

A. BACKGROUND

Since a multitude of research contracts are awarded annually by local, state, and federal agencies, there is clearly no dearth of studies that could be considered for a data archive. As attractive as this may appear to the archivist and to those interested in maximizing the potential of existing data, the project officer is left with the task of evaluating the worth of the data for archival purposes. Careful evaluation of a study and its data before electing to have them archived is essential to the creation of a useful archive. However, deciding which ones should be archived is not an easy task. In part, this is due to the large number of studies funded (each of which usually generates multiple files) and in part to the wide range of content encompassed by these studies. Decisions about a study's worth require not only an understanding of the substantive area and its methodological characteristics but also demand that the decision makers have a strong sense of whether or not the study findings will be of interest to others in the field in question. For example, the Division of Policy Research and Analysis of the National Science Foundation (NSF) has funded research in such disparate areas as energy, innovation processes, the socioeconomic effects of science and technology, and public policy related to science and technology. The National Institute of Education (NIE) has sponsored research in such diverse areas as compensatory education, school finance, career education, bilingual education, special education, education for the handicapped, and continuing education. Collectively, these research projects have generated thousands of data files. If either NSF or NIE were to archive all the data files produced by its programs, it would be necessary to create a separate division for the sole purpose of documenting and archiving these data projects.

It is important to emphasize that not all data are equally valuable in secondary analysis. To determine which data are most valuable, all data bases generated from projects, studies, and awards should be evaluated using specific criteria. These preliminary evaluations will indicate whether a data base is suitable for, and worth, archiving and how much effort should be devoted to archiving it. We have developed a specific set of indicators which can help to ascertain if the time and effort required to archive a particular data set is a wise investment. An excellent method of determining the value of an investment is to consider its "pay off" value. The "pay off" in this case refers to how importantly the study and its results contribute to science and public policy.






B. DESCRIPTION OF EVALUATION CRITERIA

Data archiving costs relatively little in contrast to the expenses incurred in initial data collection and analysis activities. Nevertheless, considerable time and effort are expended to develop and disseminate a data file. To determine whether a data set should be archived, these expenditures are compared with the potential value or "yield" of the file to researchers. Indicators of "yield" are typically subjective, but can be measured.

In this section, we present and describe evaluation criteria used to help measure yield. An evaluation form has been designed to help project officers identify high-yield data sets for archiving. The criteria listed on this form are not hard and fast rules, but rather guidelines to assist project officers in making decisions about what to archive. The form itself appears as Figure 1. A more detailed description of the criteria follows Figure 1.

In summary, there are five criteria for evaluating data yield: scope of data, study design, topical area, public interest, and quality of data. Each criterion is rated on a scale, with a low score of "1" and a high score of "5." Although the scale is not weighted, a score of "1" on data quality would cause the data set to be rejected, regardless of scores on other criteria. This is because faulty data create serious problems in secondary analysis and are a poor basis for public policy research. A total score of 20-25 indicates that the data set is definitely worth archiving; 15-19 points indicate that the data are possibly worth archiving. Data sets scoring less than 15 points should not be archived.
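The scoring rule just described can be written out as a short Python sketch. This is only an illustration of the rule as stated in the text, not part of the original guide; the function name `archiving_decision`, the criterion keys, and the wording of the returned labels are assumptions introduced here.

```python
# Illustrative sketch of the five-criterion scoring rule.
# The names below are hypothetical; the guide defines only the criteria,
# the 1-5 scale, the score bands, and the data-quality override.
CRITERIA = ("scope", "study_design", "topical_area",
            "public_interest", "data_quality")


def archiving_decision(scores):
    """Return (total, decision) for a dict of criterion -> score (1..5)."""
    for name in CRITERIA:
        if name not in scores:
            raise ValueError(f"missing score for {name}")
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")

    total = sum(scores[name] for name in CRITERIA)

    # A "1" on data quality rejects the data set regardless of the total.
    if scores["data_quality"] == 1:
        return total, "do not archive (data quality)"
    if total >= 20:
        return total, "definitely archive"
    if total >= 15:
        return total, "possibly archive"
    return total, "do not archive"
```

For example, a data set rated 5 on every criterion except data quality, where it is rated 1, totals 21 points but is still rejected, which is the override the text describes.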

To use the evaluation criteria and form correctly, it is important to understand fully the issues related to each criterion. In addition, these criteria should not be considered fully independent measures. Hopefully,






FIGURE 1

EVALUATION FORM FOR ARCHIVING DECISIONS

CRITERIA FOR DATA YIELD                                    SCORE
                                                   lowest          highest

1. Scope of Data                                     1    2    3    4    5

   Guideline: If a data set has a national probability sample, the full five points should be assigned. One point should be assigned to studies employing convenience samples of a narrow scope, such as individual cities, school districts, or families.

2. Study Design                                      1    2    3    4    5

   Guideline: If a data set is part of a long-term longitudinal sample, the full five points should be assigned. All cross-sectional (one-time) studies should receive a score of one.

3. Topical Area                                      1    2    3    4    5

   Guideline: If a data set is of broad substantive focus and will be potentially useful in solving either scientific or policy questions, a full five points should be assigned. A score of one should be assigned to studies of very narrow topical interest. A study of stratospheric conditions in Waltham, Massachusetts, from 1971 to 1978 would be considered of low topical interest and would be assigned a score of one. Intermediate scores should be assigned to the degree that a data set can be used to answer current research and policy questions.

4. Public Interest                                   1    2    3    4    5

   Guideline: If there have been unsolicited requests for a given data set by universities, policy makers, or scientists and informal or formal gauges of public interest are high, the full five points should be assigned. A score of one should be assigned to studies in which no requests have been made and no interest has been shown when public interest has been assessed.

5. Quality of Data                                   1    2    3    4    5

   Guideline: Data evaluators must be very cautious when judging this criterion. If data quality is low, this criterion will override other criteria and cause the data set to be rejected for archiving purposes. A score of one means that the data set cannot be considered for secondary analysis purposes, regardless of its other merits.

   A score of five should be assigned to studies which employ reliability and consistency checks, coding checks, and a data collection plan. Intermediate scores should be assigned based on the quality of consistency checks, data formatting, and data architecture.

TOTAL SCORE                                          5-14    15-19    20-25

   The sum of the criteria can be used as an aggregate measure of whether data should be archived or not. The scale ranges from a low score of 5 to a high score of 25. A data set with a cumulative score ranging from 20 to 25 should definitely be archived. A score of 15 to 19 should be the basis for seriously considering archiving. A score of below 15 should be the basis for deciding definitely not to archive.

   Note: If the data set is scored "1" on data quality, it should definitely not be archived, regardless of its cumulative score.

each will be correlated in a file which appears to have high data-yield potential.

1. Scope of Data Set

The scope of the data set is typically the first evaluation criterion considered. It refers to the research population on which the sample is based. Research populations can include all Michigan State University freshmen, all urban riots which occurred between 1968 and 1972 in the United States, all court accessions for juvenile delinquents in the State of Washington, or the adult population of Czechoslovakia. The scope is the largest group of individuals to which statistical inferences can be drawn from the sample.

Scope is not measured in terms of size, but rather in terms of representativeness. A sample of 1,500 respondents representative of the nation's employed mothers would be ranked high on scope, while a sample of 10,000 respondents from the city of Moose Jaw, Saskatchewan, would be rated low on scope.

Data sets having a wide scope are often very valuable for archiving purposes. Examples of such data sets are census tract projects, national election studies, general social surveys, and nationwide impact studies of substantive areas such as education, law enforcement, and the utilization of science and technology. Because these studies are based on national populations and not on smaller units, broad statistical inferences can be made. For instance, a narrow-scope project, such as a study of women in the Signal Corps of the U.S. Army in Europe, has a very low scope rating. Statistical generalizations cannot be extended beyond this limited group. In contrast, a sample of 1,500 respondents representative of the nation's voting population would have a high scope rating, because its results could be generalized to describe all adult voters in the United States.



As the cost of collecting primary data for nationally representative studies has increased, the value of wide-scope samples has also increased. For example, the Department of Labor and the National Institute of Mental Health spent over $1 million to conduct their 1977 Quality of Employment Survey. Fewer and fewer research organizations will be able to conduct studies of national scope. Those which do so will tend to focus on broader issues, problems, and needs as costs continue to increase.

2. Study Design

A second criterion of data yield is study design. Two types of survey designs are most often used in social research: cross-sectional and longitudinal. Cross-sectional studies are designed to look at phenomena at one point in time. For example, Title I compensatory education services may be assessed during 1980 or the marital happiness of dual career couples may have been assessed in a 1973 study. In longitudinal studies, phenomena are continually or periodically observed over a length of time. For example, the Census Bureau might look at general trends in fertility rates from 1956 to 1975 or the Bureau of Labor Statistics might look at labor force participation rates for females from 1966 to 1979. Longitudinal studies which use the same respondents and measure the same variables over a period of time are called panel studies. For instance, the National Opinion Research Center might have assessed the same respondents' political attitudes in 1960, 1964, 1968, and 1972.

Longitudinal designs involving panels or repeated observations on other phenomena are valuable because they collect data for the analysis of change. The 1977 Quality of Employment Survey, which examined workers' perceptions of labor standards, problems, job satisfaction, job stress, and the meaning of work, re-interviewed a panel of respondents who had also been interviewed in 1973. The panel survey was particularly important because it allowed researchers to view social indicators over time.

Like large-scale studies, longitudinal studies, especially those involving panels, are expensive; few can be undertaken without government support. Because of their ability to measure social change and the expenses associated with their conduct, studies employing a longitudinal design should be seriously considered for release to the public.

Although cross-sectional studies are less powerful than longitudinal studies in assessing change, well-executed cross-sectional studies may be worth archiving, especially when they concern new phenomena and trends. For example, the 1977 Quality of Employment Survey also contained a cross-section of new respondents whose responses were compared with those of the panel. The cross-sectional portion of the study investigated many new topics, among them employment of the respondent's spouse and the impact of both spouses' working on family life. Given the changes in sex roles, the emergence of the dual career family, and women's increased labor participation, the cross-sectional survey contains data rich in potential for secondary analysis. Another example of a valuable cross-sectional study is the Safe School Study sponsored by the National Institute of Education. This study provides national estimates on an issue that had not previously been examined in depth: the extent of violence in our nation's schools.

3. Topical Area

Studies with a wide scope and longitudinal design are not always of value to secondary analysts. An important additional consideration in archiving is the topical or substantive areas a study covers.



While the criteria of scale and survey design lend themselves to objectivity and quantification, assessing what might be of interest to secondary analysts entails making judgments that are generally more subjective. In the field of sociology in the 1940s, rural sociology was near the top of specialty areas. Studies of fertilizer dissemination and hog-corn correlations fascinated sociologists and statisticians of the period. But in the urban America of the 1970s, rural sociology is not a popular specialty area. Had machine-readable data on new corn hybrids of the '40s been saved, they might have little relevance for today's predominately urban society.

Efforts designed to collect data to solve major social problems or answer important scientific or policy questions should be reviewed in light of their short- and long-term interest to researchers.

4. Public Interest

Like the topical area criterion, public interest in specific data files may be difficult to gauge. Clearly, a data file for which there is public demand, such as census data, should be seriously considered for archiving, even if it receives low ratings on other criteria. One obvious indicator of public interest is frequent, unsolicited requests for data. If an agency receives many requests for compensatory education, school finance, or school violence data, the potential yield is high.

One way to gauge public interest is to formally or informally survey researchers or university social science data centers which disseminate data. A more formal approach would be to publicize the potential availability of a data file in professional journals and request inquiries from interested parties. In addition, a number of professional organizations of data users have been formed in recent years which could be queried to provide an indication of interest. Appendix B lists a number of these organizations and persons to contact.

5. Quality of Data

Data quality refers to the care taken to collect, code, and format data. High-quality data are consistent with the survey design, correctly coded, and properly formatted. The consistency of actual data with the survey design is a particularly important consideration. In a panel study designed to observe changes in gender roles over time, minimizing such factors as loss of respondents would increase the consistency of, and confidence in, the data. If 60% of the original respondents were not re-interviewed, the survey's representativeness would be lost. Coding and formatting outcomes are also indicators of data quality. For example, if 20% of the responses to an item asking if the respondent's spouse was employed were not coded properly, this would raise serious doubts about the quality of the study. Although a study could be longitudinal, of wide scope and great topical significance, and in public demand, its data-yield potential could be undermined due to coding mistakes made during the data collection process. In such a case, the data would be comparatively unimportant for secondary analysis because of their limited reliability.
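The two quality indicators mentioned above, panel retention and the share of improperly coded responses, are simple proportions. The sketch below shows one way a reviewer might compute them; it is illustrative only. The function names and the idea of checking responses against a list of valid codes are assumptions made here, and the guide gives 60% attrition and 20% miscoding as examples of serious problems, not as formal cutoffs, so no threshold is built in.

```python
# Illustrative only: hypothetical helpers for two data-quality indicators.

def retention_rate(original_ids, reinterviewed_ids):
    """Share of the original panel that was interviewed again."""
    original = set(original_ids)
    if not original:
        raise ValueError("original panel is empty")
    return len(original & set(reinterviewed_ids)) / len(original)


def miscoded_share(responses, valid_codes):
    """Share of responses to one item whose code is not in the codebook."""
    if not responses:
        raise ValueError("no responses supplied")
    valid = set(valid_codes)
    bad = sum(1 for code in responses if code not in valid)
    return bad / len(responses)
```

A panel of ten respondents of whom four were re-interviewed has a retention rate of 0.4, which corresponds to the 60% loss used as an example in the text.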

It is, of course, difficult to make judgments of data quality prior to data collection. One method of assessing data quality prior to data collection is to assess the data collection contractor's experience. In cases where experience is unknown or limited, it is important to emphasize the importance of data documentation and quality controls to potential data collection contractors. In some cases, low-quality data result from factors beyond the control of the data contractor. In the panel study given as an example earlier, high unemployment, divorce rates, or other external events might have caused attrition of respondents.



6. Summary of Data-Yield Criteria

The most important criteria for evaluating data sets are the scope of data collection, survey design, topical significance, public interest, and data quality. The evaluation sheet which has been provided gives the project officer a summary of data-yield criteria and can be used for any data set being considered for archiving. Any of the criteria can be weighted, with the possible exception of data quality, depending upon the specific goals of the agency and the specific function the archive will serve.

C. ARCHIVED DATA SETS: CONTENTS AND LEVELS OF SUPPORT

The decision to archive a single data set or collection of data sets demands that a second decision be made: how much effort to devote to archiving. The work involved in archiving a data file or a collection of files may range from the creation of simple documentation to a more complex undertaking, consisting of reformatting data files and writing completely new documents to describe the archive. To present a full picture, we will discuss the components needed to create the best data archive possible, describe the alternatives and options available within each component, and provide the rationale for choosing each alternative. Using this information in conjunction with data-yield scores, project officers can make decisions about the type of archive to be created and the components to be incorporated in it.



The highest level of archiving support would contain all the components described below. The lowest would entail releasing the data as they are received from the data collection contractor and providing copies of any documentation available in the contractor's reports.



Our discussion of the components is organized to reflect the two major tasks involved in data archiving: structuring the files and developing the accompanying documentation. Each of these two areas is discussed in the next section.



1. File Structure

The preparation of data files for archiving and release to the public focuses on two major concerns:

   a. Ease of Use
   b. Flexibility

There are two primary strategies for dealing with these concerns:

   c. Organization/structure
   d. Standardization/recoding

a. Concern: Ease of Use

Data files must be prepared in a manner that expedites their use by an analyst involved in secondary research. During the initial data collection and analysis phases, the data files were prepared to fulfill the specific needs of the research project. These needs also dictated the decisions made on file organization or coding. Nonetheless, the files, regardless of how they were organized or coded, had to be analyzed to fulfill the terms of the contract. However, secondary analysts will be working under a significantly different set of constraints, usually to answer a significantly different research question. If the data files are difficult to access or analyze, these analysts may have to redesign or table their research.

b. Concern: Flexibility

The data archive contractor faces an additional issue, that is, how to maximize the data's utility for further analytic purposes. This issue becomes especially important when modifications or recodes to the data are planned as part of the archiving process. The case of recoding of missing values helps illustrate the implications of the issue of flexibility.



In NIE's Safe School Study data, missing value codes could have one
of six different values, depending on the reason the data were missing.
One value indicated "don't know," another indicated "refusal to answer the
question," and still another indicated a "legitimate skip." (The respondent
should not have answered the question and was routed around it in the survey.)
Other values indicated other problems in the data collection. The existence
of these six missing values caused difficulties when early versions of the
Statistical Package for the Social Sciences (SPSS) were used, since these
versions only allowed three independent missing values. Each time an SPSS
analysis run was made, the six missing value codes for each variable had
to be recoded into three at most. From most analysts' viewpoint, the file
would be easier to use if only one or, at most, three missing values existed
for each variable. Unfortunately, some analyses might not have been possible
unless the six missing value codes were differentiated; therefore, if these
variables were recoded on the archive files, a potential analysis opportunity
may have been foreclosed.*
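The loss described above can be illustrated with a minimal sketch. The numeric codes below (91 through 99) and the labels for the three unnamed reasons are assumptions made for illustration; the report states only that six codes existed and names three of the reasons.

```python
# Hypothetical missing-value codes. The report names three reasons and
# says only that six codes existed; the numbers and the last three
# labels are assumptions for illustration.
SIX_CODES = {
    91: "don't know",
    92: "refusal to answer",
    93: "legitimate skip",
    94: "other problem A",
    95: "other problem B",
    96: "other problem C",
}

# One possible collapse into the three codes early SPSS versions allowed.
COLLAPSE = {91: 97, 92: 98, 93: 99, 94: 97, 95: 97, 96: 97}


def collapse(values):
    """Recode each missing-value code; leave ordinary values untouched."""
    return [COLLAPSE.get(v, v) for v in values]


def count_code(values, code):
    """Count how many times a given code appears."""
    return sum(1 for v in values if v == code)


responses = [1, 2, 91, 94, 93, 91, 95, 2]
collapsed = collapse(responses)
```

Before the collapse, `count_code(responses, 91)` isolates the "don't know" answers. Afterward they share code 97 with the other problems and can no longer be separated, which is the foreclosed analysis the text warns about.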

c. Strategy: Organization/Structure

In any large-scale data collection project, decisions about structuring and
merging data files are based on the specific research issues to be addressed by
the analysis. Consequently, each data file is organized in a manner consistent
with those needs. For example, a classroom observation study in which an
observer completed a data sheet for each ten-minute time period within a school
day could be organized in two alternative structures: in the first, the student
is the unit of analysis; in the second, the activity is the unit of analysis.



* SPSS versions 7.0 and above allow a range of missing values to be used, making
this particular point moot. At the time the Safe School Study was conducted,
it was a very real issue.



In the first structure alternative, using the student as the basic
unit of analysis, one long record would be created for each student observed,
and each of the student's activities would be contained in a separate variable.
Assuming that there were 25 ten-minute periods in the school day and that
the observers marked an activity code for each period, the record would contain
26 variables (a student identifier and 25 period variables). In the second
alternative structure, an activity would be the basic unit of analysis.
Each ten-minute period for each student would be recorded as a separate
record in the file having three variables (the student identifier, a
code for the period recorded, and an activity code). Although the first
structure is considerably more compact (i.e., it occupies less computer
space), the second is more suitable to answer such questions as, "What
is the most popular activity?" or "On the average, how many
periods are spent reading?"
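The two structures, and the conversion between them, can be sketched as follows. This is an illustrative sketch only: the function names, the three-period day, and the activity codes (1 = reading, 2 = mathematics, 3 = recess) are assumptions, not part of the study described.

```python
from collections import Counter


def student_to_activity_records(student_records):
    """Turn one-record-per-student into one-record-per-period."""
    activity_records = []
    for record in student_records:
        student_id, period_codes = record[0], record[1:]
        for period, activity in enumerate(period_codes, start=1):
            activity_records.append((student_id, period, activity))
    return activity_records


def most_popular_activity(activity_records):
    """Return the activity code that occurs in the most periods."""
    counts = Counter(activity for _, _, activity in activity_records)
    return counts.most_common(1)[0][0]


def average_periods_spent(activity_records, activity):
    """Average number of periods per student spent on one activity."""
    students = {student_id for student_id, _, _ in activity_records}
    total = sum(1 for _, _, a in activity_records if a == activity)
    return total / len(students)


# Student-level file: identifier followed by one activity code per period.
# Three periods are used instead of 25 to keep the example short.
students = [
    (101, 1, 1, 2),
    (102, 1, 3, 2),
]
activities = student_to_activity_records(students)
```

With the activity-level records, each question in the text becomes a simple count over one column, whereas the student-level file would require looping over every period variable.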

/ ' ■ ■ ■ ' _ _ _ ____ - -_- -__ 

Structuring longitudinal data poses a similar problem. Should the
data collected in each longitudinal period be treated as a separate data
record? Or, should the data for all periods for each person be merged
into one larger record?

The final structure-related issue to be addressed deals with the
appearance of data at different levels of analysis on the same data
file. This issue is sometimes called the "hierarchical vs. rectangular
argument." In a rectangular file, each observation or record on the
file is at the same level of analysis and contains data on the same
questions. A rectangular file can be envisioned as a piece of graph
paper on which each horizontal line represents an observation and each
column represents a different question. When the data is viewed, it is
rectangular in shape. Although different data items may not appear on some
lines because no response was given, potentially each line could contain
data for each column.


In hierarchical files, each observation or record is not necessarily
at the same level of analysis, nor does it necessarily represent identical
data items. In addition, each record's length can be different, depending
on the data it might contain.

. . ' , _ . . . . ___ I 

The National Crime Survey (NCS) data sets distributed by the Law
Enforcement Assistance Administration are a prime example of hierarchical file
organization. In the NCS data file, records are located at four different
levels of observation.

- _ ^ : 

• Community records contain information about the community in which
  the survey was taken.

• Household records contain information about the household being
  interviewed.

• Individual records reference personal questions pertaining to each
  individual within a household.

• Incident records provide detailed information on each crime that
  occurred.



Each of these record types is of a different length, and each contains
different information. The file is organized so that community records are
followed by records of the first household in the community, then records
of the first individual within the first household, and, finally, the first
individual's incident records. The record for the second individual in the
first household in the first community appears next, followed by the second
individual's incident records. After the records for all individuals in the
first household are completed, records begin for the second household. When
all households for that community are reported, another community record begins
on tape.
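The kind of special program needed to use such a file with a rectangular-only package can be sketched as below. This is a simplified illustration, not the actual NCS layout: the one-letter record types ("C", "H", "P", "I") and the single value carried by each record are assumptions.

```python
def flatten_incidents(records):
    """Produce one rectangular row per incident, carrying parent identifiers.

    `records` is a sequence of (record_type, value) pairs in tape order.
    The record types "C", "H", "P", "I" (community, household, person,
    incident) are hypothetical labels for the four levels.
    """
    community = household = person = None
    rows = []
    for record_type, value in records:
        if record_type == "C":
            community, household, person = value, None, None
        elif record_type == "H":
            household, person = value, None
        elif record_type == "P":
            person = value
        elif record_type == "I":
            rows.append((community, household, person, value))
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return rows


tape = [
    ("C", "c1"),
    ("H", "h1"),
    ("P", "p1"),
    ("I", "burglary"),
    ("P", "p2"),
    ("I", "theft"),
    ("I", "assault"),
    ("H", "h2"),
    ("P", "p3"),
]
rows = flatten_incidents(tape)
```

Note that person "p3", who reported no incidents, does not appear in the incident-level rows. Any single rectangular view of a hierarchical file makes a choice of this kind, which is part of the tradeoff between the two structures.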

\ > ^ _ ; , 

A file in which each data record contains the same type of information,
such as a student survey file sorted by student within school, is not a
hierarchical file. Since each data record contains the same type of information,
this is simply a rectangular file in a particular sort sequence.




For archiving purposes the major consideration is: How can the files
be organized in a manner which preserves maximum flexibility for the secondary
analyst? We will address the related issues in turn.

Hierarchical versus Rectangular Structure. Hierarchical and rectangular
file structuring each have merits. In choosing one or the other, tradeoffs
are necessarily made. For data collected at varying levels, a hierarchical
format is the most compact and flexible for data storage and analysis. For
instance, the solutions of analytic problems requiring the use of district-
and individual-level data are facilitated through a hierarchical data structure.
The rectangular format, however, is simpler and easier for an analyst not
involved in an original study to use and understand. In addition, the most
popular statistical analysis package is the Statistical Package for the Social
Sciences (SPSS), which can process only rectangular files. To analyze a
hierarchical data set with SPSS, a programmer would have to write a special
program to manipulate any hierarchical data within the file. The development
of such a program would be costly and time-consuming. The OSIRIS statistical
package, also widely used, is similarly unable to handle hierarchical files
directly. The Statistical Analysis System (SAS), whose availability is
much more limited, can handle hierarchical files but only in a rather obscure
manner. The SAS manual does not address the issue of hierarchical files
or give good examples of its use with such files. The alternatives associated
with this component are listed below, from the highest level of effort and
usefulness to the lowest:

_ \. . ^ if 

• Structure the files hierarchically; develop and provide
  programs which would allow analysts to manipulate the
  data using popular statistical programs;

• Structure the file rectangularly;

• Do not restructure; use the files as received from the
  contractor.



Merging of Files. Frequently, data collected on the same level of
analysis but through different instruments or techniques are originally
organized as separate files. Within many large projects, multiple files contain
the same level of data. For example, district-level data may have been collected
with three instruments. It is necessary to decide whether these separate files
should be merged into one as part of the archiving process. The answer to
this question is based on analysis of two factors: ease of merging and
ambiguity of documentation.


Ease of merging refers to the level of difficulty an analyst would
encounter in attempting to merge data sets. For example, an initial analysis
of school district-level data from a national survey indicated that a consistent
district coding scheme was used for all four files. It was assumed that merging
these files would be a relatively simple task. However, the student identification
numbers in the student data files were changed from year to year to reflect
changes in family structure or to identify students who left the school district
and later returned. Therefore, it was actually quite difficult to merge
the files.
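A quick check of the linkage variable can reveal this kind of difficulty before a merge is attempted. The sketch below is illustrative only; the function name and the identification numbers are hypothetical.

```python
def linkage_report(keys_a, keys_b):
    """Compare the linkage keys of two files before attempting a merge."""
    a, b = set(keys_a), set(keys_b)
    return {
        "matched": sorted(a & b),
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
    }


# Hypothetical student identification numbers from two survey years.
# Student 1002 was renumbered as 2002 in the second year.
year1_ids = [1001, 1002, 1003]
year2_ids = [1001, 2002, 1003]
report = linkage_report(year1_ids, year2_ids)
```

Unmatched keys on both sides, as with 1002 and 2002 here, suggest that identifiers were changed between files and that the merge will need a crosswalk rather than a simple match.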

Sometimes, merging may preclude the use of a file for answering a specific
research question. For this reason, we recommend that merges be done by
secondary analysts working with the archived files, if linkage variables
are clear and merging procedures straightforward. In cases where file merging
is complicated by complex linkage variables or similar problems, we recommend
that merging be performed as part of the archiving process.

The second factor in deciding whether to merge files is ambiguity of
documentation. Certain data collection instruments contain the layouts
for the resultant data records as part of the instruments themselves. Thus,
an analyst reviewing the data collection instrument can obtain the location
of each variable within the data file directly. If the file were merged
with others, the location information within the survey form would no longer
reflect actual data records. Utilization of the data set would then require
another intermediate step to translate the survey question number to an actual
location within the file. Other instruments, however, include no locational
information and, consequently, possibly confusing factors do not exist.
In either case, an analyst would have to use an intermediate codebook to
discover an item's location within the data file.

File Structure: Level. Level of analysis is a structuring issue which
can usually be easily resolved. Decisions about level of analysis have only
limited importance in the archiving process, since it is quite easy to
transform data records structured at one level to another level. For example,
in the classroom observation example discussed above, if the student was
chosen as the unit of analysis, a very simple program could be written to
transform the file into an activity-level file. The program could be written
in the familiar and readily usable SPSS format to further ease this problem.
Since no flexibility is lost with either choice, files may, in most cases,
be archived at the same level of analysis as they are received.

d. Strategy: Standardization/Recoding

Standardization and recoding are undertaken to resolve problems created
by dissimilar treatment of missing values, the use of alphabetic codes,
inconsistent coding of linkage variables, the use of similar questions in different
instruments, and codes for responses to open-ended questions.

In many studies, data is collected by different contractors, each attempting
to undertake independent substudies. Often, one result of this joint effort
is that no consistent coding scheme is used to prepare the collected data
in machine-readable form. Consequently, differing missing value codes may
appear in each of the study's data sets.

This, however, is not the only problem related to standardization and
recoding; other potential problems, intentional and unintentional, can arise.
For instance, the use of alphabetic codes (the letters "a" through "z") as
data values is an all too common and often problematic practice. In addition,
linkage variables, such as state codes, are often coded inconsistently. Another
concern in standardization is with similar questions asked in different ways
in different surveys. Although the original coding scheme does not formally
account for these similarities, they can be incorporated in the coding scheme
used in archiving. Finally, data bases often include items which were coded
from open-ended questions. It is necessary to decide whether to collapse
some of the infrequently used codes or to delete certain data items completely.
We will discuss each of these five types of recoding activities in the next
few pages.

Missing Values. Missing values are also rarely standardized across
files. This is especially true if the files were prepared by different
contractors. We propose a general approach to missing values: to institute
a consistent set of missing values throughout archive files. In addition,
we recommend the creation of a set of identical missing values for use with
all variables within the files of an archive project.

.__ • _ .: _ . . 

This approach is not quite as simple as it first appears. It suggests
many alternatives, each of which presents its own problem. The initial
inclination is to use some negative range of values to represent missing values,
say, for example, -1, -2, -3. This presents a problem when a variable's
value can legally be negative, for instance, in certain test scores or monetary
amounts. One might choose a different set of values: very large or very
small numbers which are unlikely to be legal values, for instance, "-999999."
This choice presents a problem for variables whose values otherwise occupy
only 1 column: the file size is significantly increased. Another alternative
is to use all "9's" in each data field as the missing value, or a set
of mostly "9's" for multiple missing values. Of course, this approach presents
the problem of having different sets of missing values, depending on the
width of the data field. For example, single-column data fields might have
a missing value of "9," while two-column data fields might have a missing
value of "99." This approach would also cause problems if "9" were a legal
value for a single-column data field.

Given the choices and limitations of each alternative, the best initial approach
to coding missing values is using a negative number range, since negative
values are not legal for most data items. In no case is a blank to be used
as a missing value. Its use can lead to ambiguity in analysis because many
systems cannot differentiate blanks from zeros. This causes severe problems
when zero is a legal value. To make final determinations, it is necessary
to review the missing value coding schemes in use within the data sets and
the legal values for each of the questions within the data files.
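The recommended approach, including the review of legal values, might be sketched as follows. The standard codes (-1, -2, -3), the reason names, and the two files' original codes are all assumptions made for illustration.

```python
# Hypothetical archive-wide standard: one negative code per reason.
STANDARD = {"dont_know": -1, "refused": -2, "legit_skip": -3}


def standardize(values, file_codes, legal_values):
    """Recode one file's missing-value codes to the archive standard.

    `file_codes` maps the file's own missing codes to reason names.
    `legal_values` lists the values a real response may take; the
    recode is refused if a standard code is also a legal value.
    """
    clash = set(legal_values) & set(STANDARD.values())
    if clash:
        raise ValueError(
            f"standard codes collide with legal values: {sorted(clash)}"
        )
    return [STANDARD[file_codes[v]] if v in file_codes else v for v in values]


# Two contractors used different missing-value codes for the same item.
file_a = standardize(
    [1, 5, 8, 9], {8: "dont_know", 9: "refused"}, legal_values=range(1, 8)
)
file_b = standardize(
    [3, 98, 99], {98: "refused", 99: "legit_skip"}, legal_values=range(1, 8)
)
```

After the recode, "refused" is -2 in both files. The collision check stands in for the review of legal values that the text calls for: a variable that can legally be negative would be rejected and would need a different scheme.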

Use of Alphabetic Values. A troublesome but common practice in survey
research is to use alphabetic values ("a" through "z") as responses to questions.
For example, a survey question has 11 valid responses. Instead of using
two columns for this data item, the values "1-9", "A", "B" are used. "A"
represents an answer of "10" and "B," "11." This technique is generally
used to save keypunching time, and it does reduce keypunching costs slightly.
However, it impacts most analysis activities adversely, since most statistical
packages cannot handle alphabetic responses easily.






Files which contain alphabetic codes should be recoded so that all legal
data values in the data files will be numeric. The one possible exception
to this nonalphabetic rule may be state identification codes which use the
two-character Post Office codes. Determinations on these state codes should
be made on a case-by-case basis.
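A recode of this kind is straightforward. The sketch below follows the example in the text ("A" for 10, "B" for 11) and assumes the same pattern continues for later letters; the function name is hypothetical.

```python
def recode_alphabetic(value):
    """Map '1'..'9' to 1..9 and 'A', 'B', ... to 10, 11, ...

    Extending the pattern beyond 'B' is an assumption; the text gives
    only the 'A' = 10 and 'B' = 11 cases.
    """
    value = value.strip().upper()
    if value.isascii() and value.isdigit():
        return int(value)
    if len(value) == 1 and "A" <= value <= "Z":
        return ord(value) - ord("A") + 10
    raise ValueError(f"unexpected code: {value!r}")


recoded = [recode_alphabetic(v) for v in ["1", "9", "A", "b"]]
```

Any value that is neither a number nor a single letter raises an error rather than being silently converted, so unexpected codes surface during archiving instead of during analysis.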

> y. _ . . . 

Common Questions across Studies. Sometimes, the independent substudies
of a larger study may collect similar data using dissimilar questions
and methods. One possibility for facilitating the analysis of these data
as a group is to create a common coding scheme for the responses to similar
questions. We believe that this possibility offers no direct advantage and
presents a number of serious disadvantages. In most cases, these similar
items were collected as parts of different data collection efforts and,
therefore, they are not absolutely equal. Establishing common coding could obscure
their important differences and even convince an analyst that they are, in
fact, identical precisely because an identical coding scheme appears in the
files. Our recommendation for treating similar data items is to defer recoding
to the analyst.

Collapsing Codes. It is not usually advisable to collapse or omit a
few codes, especially those post-coded for open-ended data items. Collapsing
data values permanently obscures some of the fine differences in responses.
These differences may seem minor or inconsequential, but it is impossible
to forecast what analyses might be conducted with the data in the future.
It is possible that what seems inconsequential now will become important
to someone in the future. The only instance in which we would advise collapsing
codes is when the data shows a difference that is inconsequential or not truly
representative.






Regarding the omission of data items, we make a similar recommendation:
unless the inclusion of data items would mislead and confuse an analyst or
actually reflect incorrect or unreliable data, the items should be placed
in the archive. We prefer that the analyst make these decisions; (s)he is
in a far better position to determine if the data item is valuable and relevant
to a particular analysis.

2. ARCHIVE DOCUMENTATION

Developing documentation is the second major task in data archiving.
The documentation that will accompany the archive data files is critical:
the level and quality of documentation will have a greater influence on the
future use of the data than any other factor. It is, therefore, essential
that the documentation developed is complete, accurate, and easy to use.

Archive documentation differs substantially from the documentation usually
prepared to accompany data files. Since future users of the data archive
will not have the luxury of direct contact with the original data collectors
and data analysts, the archive documentation is their only source of information.
Therefore, this documentation must anticipate and answer questions that may
be asked about the data in the future.

Preparing documentation which meets these requirements demands a variety
of skills not obviously associated with data archiving. It calls for an
interdisciplinary team combining the technical skills of programmers and data
analysts with those of professional writers, editors, and graphic designers. Archive
documentation must not only be inclusive and accurate; it must be well written,
easy to read and comprehend, and contain visual elements which help its readers
to focus on what is most important. Documents developed with these goals
in mind are not only more attractive; most people find them more inviting
to use.

In the majority of data collection and analysis projects, data documentation
is usually accorded a rather low priority. Generally, data documentation
is not a project "deliverable." When it is, no standards exist or are
subsequently established for its acceptability. In most cases, this documentation
consists of a record layout showing where each data field appears on the
data tape; in other cases, only a copy of the data collection instrument
with column numbers is provided. Information on collection methodologies,
coding techniques, and missing value treatment is usually not reported.
Analysts who require this type of information sometimes attempt to piece
it together by looking at the data tape or trying to contact someone who
has worked with the data. The drawbacks of this limited type of documentation
are evident in the following example. The Bureau of Labor Statistics requested
that a tape containing information on local area unemployment statistics
be reviewed. It was accompanied by a one-page "document" which was supposed
to enable researchers to use the tape. (See Figure 3.) The most interesting
aspect of this document is that it is presented as user-level documentation
for a data file, although

o the record format indicated is incomplete.

o it is not clear whether the state code is an alphabetic or
  numeric code.

o the data fields in columns 36 through 152 do not indicate what
  the measure is. Are these values percentages? Is there an
  implied decimal point in those numbers?

_._ ■ __ _ ■ :_ i_ , ^_ ■_ _. 

Users would have had to look at a printout of the tape to try to answer
these questions. All too often, this type of documentation is considered
adequate by its disseminators.



Because a study or project can be conceptualized as a complex whole
consisting of multiple parts, the documentation of studies must take into
consideration that an archive user needs to understand the study as a whole
as well as be familiar with the components of the whole. Documentation
should provide a broad overarching description of a research study, and it
should also focus in on the components of the study. The first of these
descriptions adopts a general perspective toward documentation and the
second type of description utilizes a more specific perspective toward
documentation. We call the documentation resulting from a general
perspective "project-level documentation," and the more specific documentation
"file-level documentation."

The next section first presents a figure showing the relationship between
the two levels of documentation and secondly briefly describes the levels. More
detail is given in Volumes III and IV of the Archiving Methodology, entitled
"Project-Level Documentation Standard" and "File-Level Documentation Standard,"
respectively. Here, we will discuss the aspects of documentation of most
concern to project officers.

■ i . 



RELATIONSHIP BETWEEN PROJECT-LEVEL AND FILE-LEVEL DOCUMENTATION

[Figure not reproduced. Recoverable labels: Project-level Documentation;
File-level Documentation; Project; Substudy; Description.]

* A single project is likely to encompass more than one substudy, and in such
cases, a separate description is developed for each substudy. If there is only
one substudy, i.e., one design, then there is no need for a project overview
document.

** Substudies usually generate more than one file and each separate file should
be documented if its data are judged worthy of archiving.






The project-level document has three major sections: project overview,
substudy descriptions, and appendices.

o The project overview summarizes the most important facts about the
  project and its historical significance.

o The substudy descriptions include sections on substudy purposes,
  findings, samples, and information on the contents of the data sets
  in each substudy.

o The appendices include a cross-reference guide, bibliography, and other
  materials related to the overall project and its substudies.

Project Overview. The first section of the project-level document is an
overview of the important facts of the study and is designed to give the reader
a thorough grasp of its substance and evolution. These facts are organized
thematically:

o Abstract

o Background - historical perspective and significance, issues addressed
  resulting in the undertaking of the study

o Research topics investigated

The intent of the project overview is to convey clearly and immediately
the important elements of the study, how these elements were conceived, and
how they articulate with each other. Thus, the overview not only describes
the study's historical and theoretical background and the topics it investigated,
but also clarifies the overall coherence of these elements, i.e., how they
logically flowed together to form the study.

The length of the project overview varies with the complexity and scope
of the study. It is recommended that descriptions of complex studies requiring
a lengthy overview employ subtitles for organization and emphasis. (See example
below.)



Substudy Descriptions. After the project as a whole has been described, a
brief overview of the relationship between the substudies comprising the project
is presented. The goal of this section is to inform the reader of how these
substudies come together within the archive. Before this goal can be achieved,
the archivist must first organize the project into substudies. Each substudy
is then detailed. Its description consists of the following components:

o title,

o background and purpose,

o study design,

o sample,

o statistical analysis,

o major findings,

o file descriptions.

The file description is a key section in the substudy description. It
gives a brief outline of each data file within a substudy and informs readers
of the type, scope, and scale of data in each file. Each file description
tells about

o the type of data in the file, alerting the reader to unexpected
  data and highlighting important or unusual contents (e.g., "This
  file is the only known source of nationally-weighted data on
  violence in schools broken down by location within school");

o the data collection instrument used to create the file;

o the number of data items per subject;

o the number of subjects.

Appendices. A key feature of the appendices is the Cross-Reference
Guide. This guide or index enables a researcher to identify information
collected through a number of different activities. A researcher interested
in analyzing reading instruction practices, for instance, could utilize
this guide to identify which questions focussed on this issue and in which
data files they are located.





Because of the large number of data items which make up a study, the creation
of this guide is both complex and time-consuming. It can be approached in two ways.
The simpler approach is to develop a Key Word in Context (KWIC) list which indexes
each word within each question. (See Figure 4.) Although this is the most
inexpensive method of creating an index, its utility is limited. The terminology
utilized in each set of data may differ; certain conceptual ideas may not be
directly described in any individual data item description. A conceptual index
is more complex and also more valuable. In a conceptual index, key conceptual
ideas within the study are identified and then used to index each data item.
(See Figure 5.)
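A minimal sketch of a KWIC list follows. It is not the indexing program used for the figures; the stop-word list, the function name, and the two sample questions are assumptions made for illustration.

```python
# Common words left out of the index; this short list is an assumption.
STOP_WORDS = {"a", "an", "the", "of", "in", "do", "you", "how", "is", "to"}


def kwic_index(items):
    """Build a Key Word in Context list.

    `items` maps an item identifier to its question text. Each entry is
    (key word, item identifier, question with the key word capitalized).
    """
    entries = []
    for item_id, text in items.items():
        words = text.split()
        for position, word in enumerate(words):
            key = word.strip(".,?;:!\"'()").lower()
            if not key or key in STOP_WORDS:
                continue
            context = " ".join(
                words[:position] + [word.upper()] + words[position + 1:]
            )
            entries.append((key, item_id, context))
    return sorted(entries)


questions = {
    "Q1": "How often do you read in class?",
    "Q2": "Is reading taught daily?",
}
index = kwic_index(questions)
keys = [key for key, _, _ in index]
```

The sample shows the limitation noted in the text: "read" and "reading" are indexed as separate key words, so a researcher looking under one term would miss the other question. A conceptual index would file both under a single idea.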


Also included in the appendices are bibliographies for the overall project and
for the individual substudies. The appendices can also contain excerpts of related
studies, reports, and other materials relevant to the study.

File-Level Documentation. Documents comprising the file-level documentation
provide detailed descriptions of each data file contained in the archive. These
file-level volumes are individual documents pertaining to each data file. They
consist of two parts: a narrative description of the file and a computer-generated
codebook.

The narrative file descriptions contain information on the goal of the
specific data file, the unit of observation, and the data's scope and scale.
The codebooks describe each data field, its location, missing values, and coding
scheme, and provide specific notes on the field. For high-yield data sets, we
recommend that a software system be used to facilitate the creation of codebooks,
to allow detailed, machine-readable codebooks to be created efficiently.
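One way such a system might represent a codebook entry is sketched below. The class, its field names, and the sample entry are hypothetical; they simply mirror the elements listed above (location, missing values, coding scheme, and notes).

```python
from dataclasses import dataclass


@dataclass
class CodebookEntry:
    """One data field as it would appear in a machine-readable codebook."""

    name: str
    start_column: int
    width: int
    codes: dict
    missing_values: dict
    notes: str = ""

    def render(self):
        """Return a printable description of the field."""
        end = self.start_column + self.width - 1
        lines = [f"{self.name}  columns {self.start_column}-{end}"]
        for code, label in sorted(self.codes.items()):
            lines.append(f"  {code:>3}  {label}")
        for code, label in sorted(self.missing_values.items()):
            lines.append(f"  {code:>3}  (missing) {label}")
        if self.notes:
            lines.append(f"  Note: {self.notes}")
        return "\n".join(lines)


# Hypothetical field, for illustration only.
entry = CodebookEntry(
    name="ACTIVITY",
    start_column=5,
    width=2,
    codes={1: "reading", 2: "mathematics"},
    missing_values={-1: "don't know"},
    notes="Hypothetical field for illustration.",
)
text = entry.render()
```

Because each entry is structured data rather than typed text, the same record can produce a printed codebook and also drive programs that read the data file.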
_ _ _ * 
A number of factors suggest that a machine-readable format be used for
codebooks. Many researchers have indicated that it is frustrating to attempt to
read "fifth-generation Xerox" codebook copies. By creating codebooks as computer
files and including them as part of the archive tapes, each researcher is able
to obtain as many first-generation copies of codebooks as required. In addition,
since a machine-readable codebook may be processed by computer, a researcher can use a computer



FIGURE 4

[Sample index pages: an alphabetical "Key Words and Phrases" index with entry
numbers (Community through Energy), followed by a subject index keyed to program
numbers (Maritime industries through Medicine).]

text editor to reformat the codebook file into a specification file for a
particular statistical system, such as SPSS. Thus, machine-readable codebooks
facilitate the use of the archived data files through standard statistical 
analysis systems. 
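
The reformatting step described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not a procedure taken from
this guide: it assumes a hypothetical machine-readable codebook in which each line
gives a variable name, starting column, ending column, and label separated by
vertical bars, and it writes commands in the style of SPSS DATA LIST and VARIABLE
LABELS syntax. The codebook layout, the function name, and the file name
"study.dat" are all invented for the example, and labels are assumed to contain
no apostrophes.

```python
# Sketch: turn a machine-readable codebook into an SPSS specification file.
# The codebook layout (name | start | end | label) is a hypothetical format.

def codebook_to_spss(codebook_text, data_file="study.dat"):
    """Return SPSS-style syntax describing the fixed-column fields."""
    fields = []
    for line in codebook_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, start, end, label = [part.strip() for part in line.split("|", 3)]
        fields.append((name, int(start), int(end), label))

    # One DATA LIST command naming every variable and its column range.
    lines = [f"DATA LIST FILE='{data_file}' FIXED"]
    columns = " ".join(f"{name} {start}-{end}" for name, start, end, _ in fields)
    lines.append(f"  /{columns}.")

    # One VARIABLE LABELS command; the final label carries the terminator.
    lines.append("VARIABLE LABELS")
    for i, (name, _, _, label) in enumerate(fields):
        terminator = "." if i == len(fields) - 1 else ""
        lines.append(f"  {name} '{label}'{terminator}")
    return "\n".join(lines)


EXAMPLE_CODEBOOK = """
# name | start | end | label
RESPID | 1 | 4 | Respondent identifier
RELIG16 | 5 | 8 | Religious preference at age 16
"""

if __name__ == "__main__":
    print(codebook_to_spss(EXAMPLE_CODEBOOK))
```

Because the codebook is itself a data file, the same source could feed other
statistical systems by changing only the output lines.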






D. DATA ARCHIVING ADVISORY COMMITTEE

Deciding which data sets to archive, and what level of archiving effort
a data set merits, are not easy tasks, because the creation of an archive demands
input from a variety of sources. Moreover, the types of users of a data archive,
along with their needs, interests, and professional expertise, must be anticipated
at the earliest stages of archive development. The best environment in which to
make archive decisions is one which represents the multitude of disciplines that
converge in the entire data archiving process, from its development to utilization.

A committee is the most natural mode in which to obtain the needed input
for these decisions. Four possible strategies for creating a Data Archiving
Advisory Committee (DAAC) are presented below.

1. Standing In-House Committee 

The standing in-house Data Archiving Advisory Committee is a permanent
advisory panel within a federal agency which makes archiving decisions. This
panel establishes general policies for data archiving, reviews all awards for
data collection, and makes initial and final judgements about whether the data
are worth archiving. The advantage of a permanent DAAC within the agency is that
it assures continuity in experience. A second advantage is that the same people
who decide that a given data file is worth archiving also have the authority to
oversee the archiving effort and to release the data to the public.

An in-house DAAC also has disadvantages. Since a relatively small number of
federal agencies currently archive their data and more archiving is done by
academic and private agencies, it may be currently unequipped to
make archiving decisions. A second, related disadvantage is that an inhouse
DAAC may not assure sufficient input from researchers who will actually be the
users of the data. A final disadvantage is that other demands on staff time may
preclude full participation, especially in agencies that are understaffed.






2. Ad Hoc Inhouse Committee

A second type of inhouse advisory panel would be one convened for a specific
archive or topical area, drawing its members on the basis of their specific
expertise. For example, in the case of the National Science Foundation, inhouse
representatives of each division would review all projects within specified topical
areas. Inhouse staff concerned with domestic environmental policy
would review all contracts awarded in that area to determine which are appropriate
for archiving. All projects related to stratospheric conditions would be reviewed
by inhouse experts in that area. These committees would be temporary; they would
be established and later disbanded, according to the allocation of research dollars.
The advantage of this model is that it will assure expertise in decision
making. But what is gained from the expertise of the temporary committees may
be somewhat diminished as a result of the lack of continuity between committees.
In addition, subject area specialists may not necessarily be versed in the technical
aspects of archiving, such as file architecture and documentation.

3. Standing Extramural Committee

The third type of panel is a standing committee of outside experts in the
topical area, data base management, and public policy. Consisting of a permanent
contingent of archiving experts and specialists in the specific research area,
this review board could make archiving decisions in a continuous and expert
fashion. Like the inhouse standing committee, the standing extramural committee
has the advantages of continuity and accumulation of experience. However,
it lacks the authority to oversee data collection and the archiving process
and to release data. In addition, its access to contractors and project officers
is more limited.

4. Ad Hoc Extramural Committee
The last type of data archiving advisory committee is the ad hoc extramural
committee, composed of distinguished academics and other data policy experts.
This type of committee shares advantages and disadvantages of the inhouse ad hoc
committee. Because it is formed on the basis of topical expertise, the committee
is in a good position to determine what is worth archiving. Because the committee
is temporary, it lacks continuity of data archiving experience and expertise.
Moreover, the ad hoc extramural committee does not have the opportunity for access
to project officers that the inhouse temporary panel does. Nevertheless, the
extramural panel may have a better understanding of the needs of interested publics.

Each variant of the Data Archiving Advisory Committee has its own unique
advantages, as can be seen in the rating of the DAAC variants below (Table 1).
The standing inhouse DAAC is strongest in continuity, in accumulated data
archiving experience, and in continual contact with project officers. Its weaknesses
lie in its possible lack of expertise in data file architecture and topical expertise.
The ad hoc inhouse DAAC has strength in topic choice and contact with project
officers but lacks continuity and the accumulation of experience. The standing
extramural DAAC is strongest in both continuity and expertise. A possible
weakness is that it may lack contact with project officers. The ad hoc
extramural DAAC is strongest in expertise and weakest in the area of continuity.






In addition, the tools necessary to develop archive data files and
machine-readable codebooks and the skills necessary to describe their effective
use are more likely to be found in an organization specializing in archiving than
in a research firm whose major purpose is research and evaluation. In combination,
the specialized functions of data collection/analysis contractors and archive
contractors encourage the optimal use of the resources needed to create an
archive data set.

From the project officer's viewpoint, a variety of actions are necessary
to perform the archive activities of Stage Three:

A. Data Collection and Analysis Contract Modifications;

B. Archive Contract;

C. Review of Archive Deliverables.



A. DATA COLLECTION AND ANALYSIS CONTRACT MODIFICATIONS

To insure that collected study data can be archived in the future by the
archive contractor, the data collection and analysis contract should include
the "Guide for Data Collection Contractors," Volume II of this Archiving Methodology.
This guide presents four versions of a "Cross-Reference
Information Form" (CRIF) and recommends that the contractor use the CRIF to identify
key information about the project and its data to the archivist. The collection
contract should also specify the format of the final data files according to
procedures outlined in the "Data Transfer Guide" presented elsewhere in this
report and provide for a small amount of funding for consultation between the
archiver and the collector. As the Contractor's Guide states, the additional
work requirements imposed on the contractor in archiving are few. In most
cases, the contractor simply turns over information and data that are normally
produced during the analysis activities.

B. ARCHIVE CONTRACT

The contract for data archiving should be awarded at the same time the






III. STAGE THREE: CREATING THE DATA ARCHIVE

In Stage Two, three decisions are made: 1) whether to archive; 2) what
to archive; 3) how much effort to devote to archiving the data selected. In
Stage Three, the most important decision to be made is: Who will actually create
the archive?

For many reasons, data archiving is best performed by an organization separate
from the organization which initially collected and analyzed the data. Research
data are usually collected and analyzed for a specific reason, within a limited
budget. Therefore, the data collection and analysis activities focus on specific
research issues being addressed as part of the study. The documentation created
by the analysis contractors reflects this orientation and is, in general, limited
to the information needed to complete the work required by contract. In addition,
if a project is operating on a limited budget, data documentation is always one
of the first areas from which funds are diverted, since it is usually considered
a "secondary" product of the project.

While many researchers affirm the value of good data documentation, the task
of writing clear user documentation is less interesting to them than data collection
and analysis. Often, writing documentation occurs at the close of a project
when most researchers are ready to start a new project. Under these conditions,
the quality of the documentation is likely to suffer.

The mix of skills needed to develop accurate, easy-to-use archive products
is quite different from those usually represented on a research team. Writers,
editors, and graphic artists form the core of people who can develop good
documentation. Although a background in social science research is essential
for writers of documentation, such writers must be able to create background
descriptions of projects which are sufficiently clear to inform an audience
which knows little or nothing about the project. In this case, the writer
is employing a different set of skills than the original social science
research and analysis team used.




collection and analysis contract is awarded, or shortly thereafter. It is
inadvisable to award the archiving contract after a project has been completed.
Initiating archiving activities when the project begins allows direct
interaction between the archivists and research staff. An early start insures
certain advantages: the information about the data is still fresh in the
researchers' minds; and the research staff is accessible, usually working
together in one place.

The archive contract will describe the level of effort that will be devoted
to archiving. This "level of effort" refers to the project's wide range of
alternative activities that can be incorporated into the archiving process.
In summary, before the archiving contract can be issued, the following
questions must be answered.

o Level of file restructuring - Will hierarchical files be rectangularized?
  Will common levels of data be merged into a single file?

o Type of data recoding - Will a consistent set of missing values be used
  throughout all of the data sets? Will consistent coding schemes be
  used for various questions?

o Type of documentation - Will new project and file documents or only
  one of these be written? Will an index on data items be created?
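
To make the first two questions concrete, the following Python sketch shows what
rectangularizing a hierarchical file and applying a consistent missing-value code
might involve. It is an illustration only. The record types ("H" for household,
"P" for person), the field names, and the original missing codes are hypothetical
and do not come from this guide.

```python
# Sketch: rectangularize a hierarchical file and apply one missing-value code.
# Record types, field names, and the original missing codes are hypothetical.

MISSING = -9  # the single missing-value code used throughout the archive file


def rectangularize(records):
    """Attach each household's fields to every person record that follows it."""
    rows = []
    household = None
    for rec in records:
        if rec["type"] == "H":
            household = rec  # remember the most recent household record
        elif rec["type"] == "P":
            if household is None:
                raise ValueError("person record found before any household record")
            rows.append({
                "hh_id": household["hh_id"],
                "region": household["region"],
                "person": rec["person"],
                "age": rec["age"],
            })
    return rows


def recode_missing(rows, field, original_codes):
    """Replace any of the original missing codes in `field` with MISSING."""
    return [
        {**row, field: MISSING if row[field] in original_codes else row[field]}
        for row in rows
    ]


hierarchical = [
    {"type": "H", "hh_id": 1, "region": 3},
    {"type": "P", "person": 1, "age": 34},
    {"type": "P", "person": 2, "age": 99},  # 99 stands for "not ascertained" here
    {"type": "H", "hh_id": 2, "region": 1},
    {"type": "P", "person": 1, "age": 0},   # 0 stands for "refused" here
]

flat = recode_missing(rectangularize(hierarchical), "age", {0, 99})
for row in flat:
    print(row)
```

The result has one row per person, each carrying its household's fields, which is
the rectangular shape most statistical packages expect.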

C. REVIEW OF ARCHIVE DELIVERABLES

Two kinds of general preliminary checking procedures are appropriate for
all archived data: data checks and documentation checks. After the data tapes
and appropriate documentation have been delivered to the agency, all data files
should be subject to preliminary review.

1. Tape Review for All Data

The tape review entails an initial reading of the tapes to check the
following items:

o readability

o correct record counts

o sample record checks.

All contractors should provide a printout of the first ten records with the tape
and documentation.
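
The three checks above can be sketched in Python as follows. This is a minimal
illustration, assuming the tape contents have been copied to a disk file of
fixed-length records; the function name, the 8-character record length, and the
sample data are hypothetical, and the expected record count would in practice
come from the contractor's documentation.

```python
# Sketch: preliminary review of a file of fixed-length records.
# The record length and expected count here are hypothetical examples.

import os
import tempfile


def review_file(path, record_length, expected_records, sample_size=10):
    """Check readability and record count; collect the first few records."""
    report = {"readable": False, "record_count": None,
              "count_matches": False, "sample": []}
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return report  # the file could not be read at all
    report["readable"] = True

    whole, leftover = divmod(len(data), record_length)
    report["record_count"] = whole
    # The count matches only if no stray bytes follow the last record.
    report["count_matches"] = (whole == expected_records and leftover == 0)
    report["sample"] = [
        data[i * record_length:(i + 1) * record_length].decode("ascii", "replace")
        for i in range(min(sample_size, whole))
    ]
    return report


# Demonstration with a temporary file standing in for a tape file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(f"{i:04d}0300".encode("ascii") for i in range(12)))
report = review_file(tmp.name, record_length=8, expected_records=12)
os.remove(tmp.name)

print(report["readable"], report["record_count"], report["count_matches"])
for record in report["sample"]:
    print(record)
```

The sample list plays the role of the printout of the first ten records: the
reviewer compares it by eye with the printout the contractor supplied.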

It is also recommended that frequency distributions be run on selected
variables within the data files. The selection of variables may be random
or in accordance with some other rationale. Each variable chosen should be
reviewed for appropriate range. For example, in the case of the variable
"religious preference at age 16," legitimate values may range from 100 to
800. A value of 81 or 1200 would therefore be considered "out of range." This
is called a range check and involves comparing actual values with legal ranges.
Variables with out-of-range values cast doubt on the construction of the
variable or the respondent's understanding of the question.
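
A range check of this kind can be sketched in Python as follows. The legal range
of 100 to 800 and the out-of-range values 81 and 1200 come from the example above;
the remaining data values and the function name are made up for illustration.

```python
# Sketch: frequency distribution and range check for one variable.
# Only the range 100-800 and the values 81 and 1200 come from the text;
# the other data values are invented.

from collections import Counter


def range_check(values, low, high):
    """Return the frequency of each value that falls outside low..high."""
    frequencies = Counter(values)
    return {value: count
            for value, count in sorted(frequencies.items())
            if not (low <= value <= high)}


relig16 = [100, 200, 200, 81, 300, 800, 1200, 200, 81]

# Frequency distribution of every value observed.
for value, count in sorted(Counter(relig16).items()):
    print(value, count)

# Values outside the legal range.
out_of_range = range_check(relig16, low=100, high=800)
print(out_of_range)
```

Reporting the frequency of each out-of-range value, rather than a simple pass or
fail, helps distinguish an isolated keying error from a systematic coding problem.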

2. Checking the Documentation

Documentation checks depend on the level of support given the data file.
Preliminary documentation checks are appropriate for all data files.

3. Preliminary Documentation Checks

The project officer should make sure that the archivist has supplied a
record layout (codebook) which shows where each data field is located on the
data tape, a copy of data collection instruments, and a description of how the
data files relate to the study.
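
One simple way to examine a record layout is to confirm that no two fields overlap
and that no field runs past the end of the record. The Python sketch below
illustrates the idea; it is not a procedure prescribed by this guide. The layout
format (name, starting column, ending column), the field names, and the record
length are hypothetical.

```python
# Sketch: consistency check of a record layout.
# Fields are (name, start, end) with 1-based, inclusive column positions.
# The layout and record length below are hypothetical.


def check_layout(fields, record_length):
    """Return a list of problems found in a record layout."""
    problems = []
    previous_name, previous_end = None, 0
    for name, start, end in sorted(fields, key=lambda field: field[1]):
        if start < 1 or end < start:
            problems.append(f"{name}: invalid columns {start}-{end}")
        if end > record_length:
            problems.append(f"{name}: extends past column {record_length}")
        if start <= previous_end:
            problems.append(f"{name}: overlaps {previous_name}")
        if end > previous_end:
            previous_name, previous_end = name, end
    return problems


layout = [("RESPID", 1, 4), ("RELIG16", 5, 8), ("AGE", 8, 9)]
problems = check_layout(layout, record_length=8)
print(problems)
```

An empty list means the layout is internally consistent; it does not show that
the layout matches the data, which the tape review addresses.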

The archivist will provide project-level documentation including substudy
descriptions, file-level documentation, machine-readable codebooks, and
programmer's guides. All of these documents should accompany the data tapes.

The first step in reviewing data documentation is to refer to the project-level
documentation. This document describes the entire study, including major
research questions, historical background, and individual substudies, as well
as other broad information about how individual substudies fit into the project.



FIGURE 6
Comparison of DAAC Models

Variant Name:
  Standing Inhouse DAAC
  Ad Hoc Inhouse DAAC
  Standing Extramural DAAC
  Ad Hoc Extramural DAAC

Criteria for Variant:
  Continuity
  Accumulated Experience
  Project Officer Contact
  Topical Expertise
  Data File Expertise
  Contact With Interested Publics
  Time Limitations

[The rating of each variant on each criterion is not legible in this copy.]






